Appendix D: Regular Expressions
Regular Expressions are patterns used to match character combinations in strings. For example, a phone number or email address can be checked against a Regular Expression pattern to validate that it conforms to a certain format.
Regular Expressions use a definition language in which certain characters have special meaning. There are many special characters; the ones used in the examples that follow are:
- Square brackets [], to specify the range of characters that are allowed.
- Curly brackets {}, to specify the number of times the previous pattern must match.
- Caret ^, to specify the pattern match must occur at the beginning of the string.
- Dollar sign $, to specify that the pattern match must occur at the end of the string.
- Using both ^ and $ together specifies that the whole input string must match the pattern.
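As a minimal sketch of how the two anchors change a match, the following uses Python's built-in `re` module purely for illustration. The engine used by the Silhouette applications is not stated here and may behave differently in edge cases, and the sample inputs are made up.

```python
import re

# Illustrative patterns: five digits, with and without anchors.
unanchored = re.compile(r"[0-9]{5}")
anchored = re.compile(r"^[0-9]{5}$")

# Without anchors, the pattern is found inside a longer string.
found_inside = unanchored.search("ID 1234567") is not None

# With ^ and $, the whole input must consist of exactly five digits.
whole_ok = anchored.search("12345") is not None
whole_too_long = anchored.search("1234567") is not None

print(found_inside, whole_ok, whole_too_long)
```

The unanchored pattern accepts the longer input because five consecutive digits appear somewhere in it, which is usually not what is wanted when validating an identifier.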
Patient Identifier Examples for SilhouetteConnect
Example A
A 5-digit patient number would be written as:
- ^[0-9]{5}$
- Values satisfying this regular expression include 12345 and 75319
Example B
A 3-digit unit ID, a dash, then a 3-digit patient number would be written as:
- ^[0-9]{3}-[0-9]{3}$
- Values satisfying this regular expression include 147-963 and 456-321
Example C
A unit ID of two upper-case alphabetic characters, then a 4-digit patient number would be written as:
- ^[A-Z]{2}[0-9]{4}$
- Values satisfying this regular expression include ZK1648 and VT7642
Example D
A three-digit unit ID, a dash, a patient number comprising 4, 5 or 6 numeric digits, and the last character being an upper-case A, B, or C to indicate three different study groups would be written as:
- ^[0-9]{3}-[0-9]{4,6}[ABC]{1}$
- Values satisfying this regular expression include 010-01934A and 110-50012C
Through the simple use of regular expressions you can control the accuracy of information entered into a system and hence improve reliability and reduce errors.
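The four example patterns can be checked against sample values with a short script. This is only a sketch using Python's `re` module, not the validation code used by the applications. The helper name `is_valid` and the rejected sample values are invented for illustration.

```python
import re

# Patterns from Examples A to D. Note that the {4,6} quantifier must not
# contain a space; many engines treat "{4, 6}" as literal text.
PATTERNS = {
    "A": r"^[0-9]{5}$",
    "B": r"^[0-9]{3}-[0-9]{3}$",
    "C": r"^[A-Z]{2}[0-9]{4}$",
    "D": r"^[0-9]{3}-[0-9]{4,6}[ABC]{1}$",
}

def is_valid(example: str, value: str) -> bool:
    """Return True if value matches the pattern for the given example."""
    return re.search(PATTERNS[example], value) is not None

# Values that should be accepted (taken from the examples above).
accepted = [("A", "12345"), ("B", "147-963"), ("C", "ZK1648"), ("D", "010-01934A")]
# Values that should be rejected (made up for this sketch).
rejected = [("A", "1234"), ("B", "147963"), ("C", "zk1648"), ("D", "010-019D")]

for example, value in accepted + rejected:
    print(example, value, is_valid(example, value))
```

Checking values that should be rejected is as important as checking values that should be accepted, since a pattern that is too permissive will not be caught by valid inputs alone.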
Further reading and an interactive tutorial can be found here: http://regexone.com/
Using Regular Expressions the following types of structures can be specified:
- A character from a group of characters, e.g.:
  - A lower-case alphabetic character: [a-z]
  - An upper-case or lower-case alphabetic character: [a-zA-Z]
  - A numeric digit: [0-9]
  - An alphanumeric character that is either a lower-case alphabetic character or a numeric digit: [0-9a-z]
  - Commonly occurring punctuation characters such as dash “-”, comma “,”, period “.”, square bracket “[”, and parenthesis “(”.
- A character occurring a fixed number of times, e.g.:
  - Three characters, each of which can be a, b or c, would be written as: [abc]{3}
- A single optional character, indicated by the question mark “?”, e.g.:
  - An optional upper-case alphabetic character would be written as: [A-Z]?
- A choice between alternatives, using the special "or" operator “|”, e.g.:
  - 1 upper-case alphabetic character or 2 numeric digits would be written as: [A-Z]|[0-9]{2}
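The structures listed above can be tried out as follows. This is a sketch in Python's `re` module; the helper `whole_match` and the test values are assumptions made for illustration. It also shows one point worth knowing when the "or" operator is combined with ^ and $: the alternatives generally need to be grouped in parentheses.

```python
import re

def whole_match(pattern: str, value: str) -> bool:
    """Return True only if the pattern matches the entire value."""
    return re.fullmatch(pattern, value) is not None

# Fixed repetition: exactly three characters, each one a, b or c.
three_abc = [whole_match(r"[abc]{3}", v) for v in ("cab", "ab", "abd")]

# Optional character: zero or one upper-case letter.
optional_upper = [whole_match(r"[A-Z]?", v) for v in ("", "Q", "QQ")]

# Alternation: one upper-case letter OR two digits.
letter_or_digits = [whole_match(r"[A-Z]|[0-9]{2}", v) for v in ("K", "42", "K42")]

# When ^ and $ are added, wrap the alternatives in parentheses. Without them,
# ^ applies only to the first alternative and $ only to the second.
ungrouped = re.search(r"^[A-Z]|[0-9]{2}$", "K999") is not None
grouped = re.search(r"^([A-Z]|[0-9]{2})$", "K999") is not None

print(three_abc, optional_upper, letter_or_digits, ungrouped, grouped)
```

In the ungrouped form the input K999 is accepted because it merely starts with an upper-case letter, whereas the grouped form rejects it.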
Things to Note
- The special treatment of the “(” and “)” characters: these have special meaning within the Regular Expression language, so when they need to be used as normal characters this must be indicated. Preceding a special character with the “\” (backslash) character tells the computer to treat the character as regular text. This is referred to as “escaping” the character. Other characters that may need escaping include “[” and “]”. Even the backslash itself requires escaping with another backslash: “\\”.
- The dash “-” character generally doesn’t require escaping, although it does have special meaning when used to designate a character range within square brackets. If a dash is used outside a character range statement then it will be interpreted as a normal character.
- If you want to ensure the regular expression only matches if the pattern covers the entire input you need to explicitly specify this by using ^ and $, which mean match the start of the input and end of the input respectively.
- Regular Expressions can become very complex and testing them carefully to ensure they both accept and reject the correct input is important. It is also recommended to test the regular expressions on all applications used for data entry (i.e. both SilhouetteCentral and SilhouetteConnect).
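The notes on escaping and on testing can be illustrated with a small sketch. It uses Python's `re` module, and the identifier formats shown (a unit ID in parentheses, and a letter, backslash and digit) are hypothetical ones chosen only to demonstrate escaping. Results in SilhouetteCentral and SilhouetteConnect should still be confirmed in those applications.

```python
import re

# Hypothetical format: a three-digit unit ID in literal parentheses followed
# by four digits, e.g. (123)4567. The parentheses are escaped with "\".
pattern = re.compile(r"^\([0-9]{3}\)[0-9]{4}$")

# A literal backslash in the input is matched by an escaped backslash "\\".
backslash_pattern = re.compile(r"^[A-Z]\\[0-9]$")

# Test both directions: inputs that must pass and inputs that must fail.
should_accept = ["(123)4567"]
should_reject = ["1234567", "(123)45678", "[123]4567"]

accept_results = [pattern.search(v) is not None for v in should_accept]
reject_results = [pattern.search(v) is not None for v in should_reject]

# "A\\1" in Python source is the three-character input A, backslash, 1.
backslash_ok = backslash_pattern.search("A\\1") is not None

print(accept_results, reject_results, backslash_ok)
```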